
A tetrahedral space-filling curve for non-conforming adaptive meshes

Authors: Burstedde Carsten; Holke Johannes
Publication venue: Society for Industrial & Applied Mathematics (SIAM)
Publication date: 01/01/2016

We introduce a space-filling curve for triangular and tetrahedral red-refinement that can be computed using bitwise interleaving operations similar to the well-known Z-order or Morton curve for cubical meshes. To store sufficient information for random access, we define a low-memory encoding using 10 bytes per triangle and 14 bytes per tetrahedron. We present algorithms that compute the parent, children, and face-neighbors of a mesh element in constant time, as well as the next and previous element in the space-filling curve and whether a given element is on the boundary of the root simplex. Our presentation concludes with a scalability demonstration that creates and adapts selected meshes on a large distributed-memory system.
Comment: 33 pages, 12 figures, 8 tables
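The abstract measures the new curve against the Morton (Z-order) curve for cubical meshes. As a reference point, the following C sketch shows the classic 2D Morton encoding by bitwise interleaving, together with constant-time parent and child computation. It covers square cells only; it is not the tetrahedral curve of the paper, which additionally has to track a simplex type per element. All function names are illustrative and not taken from any library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Spread the low 16 bits of v so that bit i moves to bit 2*i. */
static uint32_t spread_bits_16(uint32_t v)
{
  v &= 0x0000FFFFu;
  v = (v | (v << 8)) & 0x00FF00FFu;
  v = (v | (v << 4)) & 0x0F0F0F0Fu;
  v = (v | (v << 2)) & 0x33333333u;
  v = (v | (v << 1)) & 0x55555555u;
  return v;
}

/* Inverse of spread_bits_16: gather the even-position bits into the low 16 bits. */
static uint32_t compact_bits_16(uint32_t v)
{
  v &= 0x55555555u;
  v = (v | (v >> 1)) & 0x33333333u;
  v = (v | (v >> 2)) & 0x0F0F0F0Fu;
  v = (v | (v >> 4)) & 0x00FF00FFu;
  v = (v | (v >> 8)) & 0x0000FFFFu;
  return v;
}

/* Morton index of the square cell (x, y): x bits land on even positions,
   y bits on odd positions. */
uint32_t morton_encode_2d(uint16_t x, uint16_t y)
{
  return spread_bits_16(x) | (spread_bits_16(y) << 1);
}

/* Recover (x, y) from a Morton index. */
void morton_decode_2d(uint32_t m, uint16_t *x, uint16_t *y)
{
  assert(x != NULL && y != NULL);
  *x = (uint16_t) compact_bits_16(m);
  *y = (uint16_t) compact_bits_16(m >> 1);
}

/* With level-relative coordinates, the parent drops the lowest bit of x and
   of y, i.e. the two lowest bits of the index; child k (0..3) appends them. */
uint32_t morton_parent(uint32_t m)
{
  return m >> 2;
}

uint32_t morton_child(uint32_t m, unsigned k)
{
  assert(k < 4);
  return (m << 2) | k;
}
```

For example, the cell (3, 5) maps to index 39, and morton_parent(39) returns 9, the index of the cell (1, 2) one level coarser. Parent and child lookups are single shifts, which is the kind of constant-time operation the abstract claims for its simplicial analogue.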

Sources: arXiv.org e-Print Archive; Crossref; Juelich Shared Electronic Resources

Enhancing speed and scalability of the ParFlow simulation code

Authors: Burstedde Carsten; Fonseca Jose A.; Kollet Stefan
Publication venue: Springer Science and Business Media LLC
Publication date: 30/09/2017

Regional hydrology studies are often supported by high-resolution simulations of subsurface flow that require expensive and extensive computations. Efficient usage of the latest high-performance parallel computing systems becomes a necessity. The simulation software ParFlow has been demonstrated to meet this requirement and shown to have excellent solver scalability for up to 16,384 processes. In the present work we show that the code requires further enhancements in order to take full advantage of current petascale machines. We identify ParFlow's parallelization of the computational mesh as a central bottleneck. We propose to reorganize this subsystem using fast mesh partition algorithms provided by the parallel adaptive mesh refinement library p4est. We realize this in a minimally invasive manner by modifying selected parts of the code to reinterpret the existing mesh data structures. We evaluate the scaling performance of the modified version of ParFlow, demonstrating good weak and strong scaling up to 458k cores of the Juqueen supercomputer, and test an example application at large scale.
Comment: The final publication is available at link.springer.co
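The abstract credits p4est's fast mesh partition algorithms. One idea that makes such partitioning cheap is that, once elements are ordered along a space-filling curve, a balanced partition reduces to integer arithmetic on global indices. The sketch below assumes equal element weights and uses hypothetical function names; it is not p4est's API and not ParFlow's code.

```c
#include <assert.h>
#include <stdint.h>

/* First global index owned by rank p when n_elems elements, already sorted
   along a space-filling curve, are split into n_procs contiguous ranges of
   nearly equal size. Equals floor(n_elems * p / n_procs), computed without
   forming the possibly overflowing product n_elems * p. p == n_procs returns
   n_elems, one past the last element. */
uint64_t partition_begin(uint64_t n_elems, int n_procs, int p)
{
  assert(n_procs > 0 && p >= 0 && p <= n_procs);
  uint64_t q = n_elems / (uint64_t) n_procs;
  uint64_t r = n_elems % (uint64_t) n_procs;
  return q * (uint64_t) p + r * (uint64_t) p / (uint64_t) n_procs;
}

/* Rank owning global element i: the largest p with partition_begin(p) <= i,
   found by binary search. This also skips empty ranks when n_elems < n_procs. */
int partition_owner(uint64_t n_elems, int n_procs, uint64_t i)
{
  assert(i < n_elems);
  int lo = 0, hi = n_procs - 1;
  while (lo < hi) {
    int mid = lo + (hi - lo + 1) / 2;
    if (partition_begin(n_elems, n_procs, mid) <= i)
      lo = mid;
    else
      hi = mid - 1;
  }
  return lo;
}
```

With 10 elements on 3 ranks, the ranks receive the index ranges [0, 3), [3, 6) and [6, 10), and partition_owner(10, 3, 5) returns 1. No communication is needed to compute either function, since every process can evaluate them from the global element count alone.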

Sources: arXiv.org e-Print Archive; Crossref; Juelich Shared Electronic Resources

scda: A Minimal, Serial-Equivalent Format for Parallel I/O

Authors: Burstedde Carsten; Griesbach Tim
Publication date: 13/07/2023

We specify a file-oriented data format suitable for parallel, partition-independent disk I/O. Here, a partition refers to a disjoint and ordered distribution of the data elements between one or more processes. The format is designed such that the file contents are invariant under linear (i.e., unpermuted), parallel repartition of the data prior to writing. The file contents are indistinguishable from those written in serial. In the same vein, the file can be read on any number of processes that agree on any partition of the number of elements stored. In addition to the format specification we propose an optional convention to implement transparent per-element data compression. The compressed data and metadata are layered inside ordinary format elements. Overall, we pay special attention to both human and machine readability. If pure ASCII data is written, or compressed data is reencoded to ASCII, the entire file including its header and sectioning metadata remains pure ASCII. If binary data is written, the metadata stays easy on the human eye. We refer to this format as scda. Conceptually, it lies one layer below and is oblivious to the definition of variables, the binary representation of numbers, considerations of endianness, and self-describing headers, which may all be specified on top of scda. The main purpose of the format is to abstract any parallelism and provide sufficient structure as a foundation for generic and flexible archival and checkpoint/restart. A documented reference implementation is available as part of the general-purpose libsc free software library.
Comment: 17 pages, 7 figures and 2 tables
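The invariance of file contents under linear repartition can be seen from how offsets are computed: each process writes its contiguous slice at an offset given by the exclusive prefix sum of the element counts on lower ranks, so only the global order matters, not how it is split. The sketch below simulates this in serial for fixed-size elements. The parameter section_start and all function names are hypothetical and do not reflect the actual scda layout or the libsc interface. In an MPI code the prefix sum would typically come from MPI_Exscan and the write from a call such as MPI_File_write_at.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Byte offset at which rank p starts writing: the section start plus
   elem_size times the exclusive prefix sum of counts over ranks 0..p-1. */
size_t rank_write_offset(const size_t *counts, int p, size_t elem_size,
                         size_t section_start)
{
  size_t before = 0;
  for (int q = 0; q < p; ++q)
    before += counts[q];
  return section_start + before * elem_size;
}

/* Serial simulation of a parallel write. data holds all elements in global
   order, so rank p's local elements start at its first global index. Each
   rank copies only its own contiguous slice into the shared file image. */
void simulate_parallel_write(unsigned char *file, const unsigned char *data,
                             const size_t *counts, int n_procs,
                             size_t elem_size, size_t section_start)
{
  assert(file != NULL && data != NULL && counts != NULL && n_procs > 0);
  for (int p = 0; p < n_procs; ++p) {
    size_t off = rank_write_offset(counts, p, elem_size, section_start);
    if (counts[p] > 0) /* ranks owning zero elements write nothing */
      memcpy(file + off, data + (off - section_start), counts[p] * elem_size);
  }
}
```

Writing 5 two-byte elements with counts {5} on one process or {2, 0, 3} on three processes produces byte-identical file images, which is the serial-equivalence property in miniature.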

Sources: arXiv.org e-Print Archive

Recursive Algorithms for Distributed Forests of Octrees

Authors: Burstedde Carsten; Ghattas Omar; Isaac Tobin; Wilcox Lucas C.
Publication venue: Society for Industrial & Applied Mathematics (SIAM)
Publication date: 18/11/2014

The forest-of-octrees approach to parallel adaptive mesh refinement and coarsening (AMR) has recently been demonstrated in the context of a number of large-scale PDE-based applications. Although linear octrees, which store only leaf octants, have an underlying tree structure by definition, it is not often exploited in previously published mesh-related algorithms. This is because the branches are not explicitly stored, and because the topological relationships in meshes, such as the adjacency between cells, introduce dependencies that do not respect the octree hierarchy. In this work we combine hierarchical and topological relationships between octree branches to design efficient recursive algorithms. We present three important algorithms with recursive implementations. The first is a parallel search for leaves matching any of a set of multiple search criteria. The second is a ghost layer construction algorithm that handles arbitrarily refined octrees that are not covered by previous algorithms, which require a 2:1 condition between neighboring leaves. The third is a universal mesh topology iterator. This iterator visits every cell in a domain partition, as well as every interface (face, edge and corner) between these cells. The iterator calculates the local topological information for every interface that it visits, taking into account the nonconforming interfaces that increase the complexity of describing the local topology. To demonstrate the utility of the topology iterator, we use it to compute the numbering and encoding of higher-order $C^0$ nodal basis functions. We analyze the complexity of the new recursive algorithms theoretically, and assess their performance, both in terms of single-processor efficiency and in terms of parallel scalability, demonstrating good weak and strong scaling up to 458k cores of the JUQUEEN supercomputer.
Comment: 35 pages, 15 figures, 3 tables
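The first recursive algorithm, a search for leaves matching multiple criteria, can be illustrated on a single serial quadtree. In the sketch below the criteria are point locations: each recursion level keeps only the points inside the current quadrant and prunes branches with none left, while the Morton-sorted leaf array is split into one contiguous run per child. This is a simplified sketch under stated assumptions (2D, one tree, one process, a complete linear quadtree, hypothetical names and a hypothetical MAX_LEVEL); it is not the parallel forest algorithm of the paper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_LEVEL 8 /* hypothetical: coordinates are integers in [0, 2^MAX_LEVEL) */

/* A quadrant: lower-left corner in finest-level units plus its level. */
typedef struct { uint32_t x, y; int level; } quadrant;
typedef struct { uint32_t x, y; } point2d;

static uint32_t quad_len(int level) { return (uint32_t) 1 << (MAX_LEVEL - level); }

static int quad_contains(quadrant q, uint32_t x, uint32_t y)
{
  uint32_t h = quad_len(q.level);
  return x >= q.x && x < q.x + h && y >= q.y && y < q.y + h;
}

/* Recurse from quadrant q over its leaves leaves[lo..hi) (Morton-sorted).
   Only points still inside q are passed down; a branch with no remaining
   points is pruned without visiting its leaves. */
static void search_rec(quadrant q, const quadrant *leaves, size_t lo, size_t hi,
                       const point2d *pts, const size_t *active, size_t n_active,
                       long *result)
{
  if (lo == hi || n_active == 0)
    return;
  size_t *mine = malloc(n_active * sizeof *mine);
  assert(mine != NULL);
  size_t n_mine = 0;
  for (size_t i = 0; i < n_active; ++i)
    if (quad_contains(q, pts[active[i]].x, pts[active[i]].y))
      mine[n_mine++] = active[i];
  if (n_mine > 0 && hi - lo == 1 && leaves[lo].level == q.level) {
    for (size_t i = 0; i < n_mine; ++i) /* q is itself a leaf: report matches */
      result[mine[i]] = (long) lo;
  } else if (n_mine > 0) {
    assert(q.level < MAX_LEVEL);
    uint32_t half = quad_len(q.level + 1);
    size_t j = lo;
    for (unsigned k = 0; k < 4; ++k) { /* children in Morton order */
      quadrant c = { q.x + (k & 1u) * half, q.y + (k >> 1) * half, q.level + 1 };
      size_t start = j;
      /* The leaves inside child c form a contiguous run of the sorted array. */
      while (j < hi && quad_contains(c, leaves[j].x, leaves[j].y))
        ++j;
      search_rec(c, leaves, start, j, pts, mine, n_mine, result);
    }
  }
  free(mine);
}

/* For each point, store the index of the leaf containing it, or -1. */
void search_points(const quadrant *leaves, size_t n_leaves,
                   const point2d *pts, size_t n_pts, long *result)
{
  if (n_pts == 0)
    return;
  size_t *all = malloc(n_pts * sizeof *all);
  assert(all != NULL);
  for (size_t i = 0; i < n_pts; ++i) {
    all[i] = i;
    result[i] = -1;
  }
  quadrant root = { 0, 0, 0 };
  search_rec(root, leaves, 0, n_leaves, pts, all, n_pts, result);
  free(all);
}
```

Because the branches are never stored, the recursion reconstructs them on the fly from the root and the sorted leaf array, which is the point the abstract makes about exploiting the implicit tree structure of linear octrees.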

Sources: arXiv.org e-Print Archive; CiteSeerX; Crossref; Juelich Shared Electronic Resources; Calhoun, Institutional Archive of the Naval Postgraduate School